Overview


A freshly emerged cicada on a flower.
A cicada that emerged in my garden.

In the spring of 2021, the Brood X cicadas emerged throughout much of the Mid-Atlantic region. Brood X are periodical cicadas that spend most of their lives underground, emerge in the spring every 17 years, breed, and then die. In their adult form, they produce an incredibly loud song to attract mates.

Our team at the Smithsonian deployed automated sound recording units (ARUs) throughout the District of Columbia, Maryland, and Virginia. Our goal for this study was to determine how bird song may vary in response to cicada noise along a gradient of anthropogenic noise and urban intensity.

In this exercise, you will explore cicada observations collected by iNaturalist community science participants. Early reports by iNaturalist observers helped us to determine the appropriate locations to place our ARUs.

Grading


The points allotted for each question are provided in highlighted red bold text (e.g., [1.0]) within the question itself. When applicable, total points for a question may represent the sum of individually graded components, which are provided in red text (e.g., [1.0]).

Points may be deducted from each question’s total:

Note: The maximum deduction is the total points value for a given question

In addition to points allotted per question, you must ensure that your R Markdown document runs out-of-the-box [25% off of the total grade] – in other words, the document will knit without error.

In this assignment, you may use only the following R functions (Note: If you are unclear on what a given function does, use ? to view the help file!):

  • base:::
  • base::::
  • base::()
  • base::<-
  • base::=
  • base::==
  • base::!
  • base::!=
  • base::%in%
  • base::c
  • base::library
  • base::min
  • base::names
  • base::rm
  • base::tolower
  • dplyr::arrange
  • dplyr::filter
  • dplyr::full_join
  • dplyr::inner_join
  • dplyr::mutate
  • dplyr::n
  • dplyr::select
  • dplyr::summarize
  • ggplot2::aes
  • ggplot2::geom_sf
  • ggplot2::ggplot
  • ggplot2::scale_fill_viridis_c
  • ggplot2::theme_void
  • ggplot2::theme_set
  • kableExtra::kable
  • kableExtra::kable_styling
  • magrittr::%>%
  • readr::read_csv
  • rlang::set_names
  • sf::st_as_sf
  • sf::st_crs
  • sf::st_filter
  • sf::st_join
  • sf::st_read
  • sf::st_transform
  • tibble::as_tibble

Note: The packages dplyr, ggplot2, magrittr, readr, rlang, and tibble are all part of the tidyverse and are loaded with library(tidyverse).

Getting started


As always, please ensure that you are starting with a clean session!

1. [0.5] Save and knit this document:

  • [0.1] Replace my name in the YAML header with yours
  • [0.1] Add the current date in the YAML header
  • [0.3] Save the .rmd file in the output folder of your project as (but replace my name with yours): problem_set_3_Evans_Brian.rmd

All of my scripts start with a setup section where I load libraries, set session options, conduct initial data loading, and early processing steps.

2. [0.25] Load the sf and tidyverse libraries.

library(tidyverse)
library(sf)


3. [0.5] Set the theme of all of the plots in this document to theme_void().

theme_set(theme_void())


4. [1.0] Read in and process the file cicadas_brood_x_2021.csv. In doing so:

  • [0.25] Load the data as a tibble;
  • [0.25] Subset the data to where the quality grade is classified as “research”;
  • [0.25] Subset the data to the fields datetime, scientific_name, user, longitude, and latitude;
  • [0.25] Globally assign the resultant object with the name cicadas_temp.
cicadas_temp <-
  read_csv("data/raw/cicadas_brood_x_2021.csv") %>%
  filter(quality_grade == "research") %>%
  select(datetime:user,
         longitude,
         latitude)

About the cicada observation data

The cicadas_temp file represents research-grade iNaturalist observations of Brood X periodical cicadas from February through September of 2021. These data contain the following fields:

  • datetime: The date and time of the observation, formatted as an ISO 8601 (yyyy-mm-dd hh:mm:ss) datetime object and reported in UTC;
  • scientific_name: The genus and species of the observed cicada;
  • user: The user’s login name;
  • place_guess: Best guess for the location of the observation;
  • longitude: The longitude of an observation, recorded in EPSG 4326;
  • latitude: The latitude of an observation, recorded in EPSG 4326;
  • image_url: The address of the image online;
  • quality_grade: Whether the species identification can be trusted, based on expert opinion – options include “casual”, “needs_id”, and “research” (verified by experts);
  • description: Notes that users may submit with their observations.

5. [1.25] Read and processes the file counties.geojson (or counties_low_res.geojson if you have memory issues!). In doing so:

  • [0.25] Read in the data as a simple features shapefile;
  • [0.25] Convert all field names to lowercase;
  • [0.25] Subset to the columns geoid, name, and state_name (in that order);
  • [0.25] Transform the Coordinate Reference System (CRS) of the object to EPSG 2283 (a CRS often used for Virginia);
  • [0.25] Globally assign the resultant object with the name counties_temp.
counties_temp <-
  st_read("data/raw//shapefiles/counties_low_res.geojson") %>%
  set_names(names(.) %>%
              tolower()) %>%
  select(geoid,
         name,
         state_name) %>%
  st_transform(crs = 2283)
## Reading layer `counties_small' from data source 
##   `C:\Users\User\Documents\week4\data\raw\shapefiles\counties_low_res.geojson' 
##   using driver `GeoJSON'
## Simple feature collection with 3221 features and 12 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -179.1743 ymin: 17.91377 xmax: 179.7739 ymax: 71.35256
## Geodetic CRS:  NAD83

About the county data

The counties_temp file is an sf multipolygon shapefile where each feature represents a county in the United States. These data contain the following fields:

  • geoid: A primary key associated with each shape;
  • name: The county name;
  • state_name: The name of the US state where the county is located.

Subsetting data


6. [1.0] Brood X cicada are comprised of three species, Magicicada cassinii, M. septendecim, and M. septendecula. A large proportion of the observations (about 60%) could not be identified to species (you can explore this with group_by() and summarize()) and one of the observations was of a 13-year periodical cicada, M. tredecim (Note: This species is not a Brood X cicada!).

  • [0.50] Subset the data to Brood X cicada that could be identified to species;
  • [0.25] Assign the object to your global environment with the name cicadas_brood_x;
  • [0.25] Remove cicadas_temp from your global environment (in a separate code block).
# cicadas_brood_x %>% 
#   group_by(scientific_name) %>% 
#   summarize(n = n())

cicadas_brood_x <-
  cicadas_temp %>%
  filter(
    scientific_name %in%
      c(
        "Magicicada cassinii",
        "Magicicada septendecim",
        "Magicicada septendecula"
      )
  )

rm(cicadas_temp)

When working with spatial data, it’s often necessary to subset the spatial extent of your files to your study area. This will save a lot of processing time (and, potentially, help you avoid out-of-memory errors!).

7. [1.0] Subset counties_temp to counties in the District of Columbia, Maryland, or Virginia [0.50], globally assign the resultant object with the name counties [0.25], and in a separate code block, remove counties_temp from your global environment [0.25].

counties <-
  counties_temp %>%
  filter(state_name %in% c("District of Columbia",
                           "Maryland",
                           "Virginia"))

rm(counties_temp)


8. [1.5] Convert cicadas_brood_x to an sf point file [0.25] and subset to our area of interest. In doing so:

  • [0.25] Transform the CRS of the object to the same projection as counties;
  • [0.50] Subset the data to observations in the District of Columbia, Maryland, or Virginia;
  • [0.25] Globally assign the resultant object with the name cicadas_sf;
  • [0.25] Remove cicadas_brood_x from your global environment (in a separate code block).
cicadas_sf <-
  cicadas_brood_x %>%
  st_as_sf(coords = c('longitude', 'latitude'),
           crs = 4326) %>%
  st_transform(st_crs(counties)) %>%
  st_join(counties) %>%
  filter(!is.na(state_name))

rm(cicadas_brood_x)

Data exploration


In evaluating the distributions of Brood X for our experiment, we were interested in when each species of cicada emerged and where the most cicadas were observed.

9. [1.5] Generate a kable [0.25] that displays the datetime that each species of cicada [0.50] was first observed in each state [0.50]. Please arrange the table from earliest to latest datetime [0.50] (Note: The columns of the summary table should be state_name, scientific_name, and datetime).

cicadas_sf %>%
  as_tibble() %>%
  group_by(state_name,
           scientific_name) %>%
  summarize(datetime = min(datetime)) %>%
  arrange(datetime) %>%
  knitr::kable()
state_name scientific_name datetime
District of Columbia Magicicada septendecim 2021-05-03 15:56:30
Virginia Magicicada septendecim 2021-05-12 10:25:33
Maryland Magicicada septendecim 2021-05-14 08:16:27
Virginia Magicicada cassinii 2021-05-15 13:45:17
Maryland Magicicada cassinii 2021-05-16 12:35:20
Virginia Magicicada septendecula 2021-05-17 13:45:23
Maryland Magicicada septendecula 2021-05-18 12:56:55
District of Columbia Magicicada cassinii 2021-05-19 11:36:08
District of Columbia Magicicada septendecula 2021-05-21 10:52:00


10. [1.5] Generate a choropleth map of counties [0.50], where:

  • [0.50] The fill color is determined by the number of cicadas that were observed in each county;
  • [0.50] Counties with no observations are colored light gray.
cicadas_sf %>%
  as_tibble() %>%
  group_by(name) %>%
  summarize(n = n()) %>%
  full_join(counties,
            .,
            by = 'name') %>%
  ggplot() +
  geom_sf(data = counties) +
  geom_sf(aes(fill = n)) +
  scale_fill_viridis_c(option = "plasma",
                       na.value = "#dcdcdc")